feat: maximum Yandex Dialogs platform integration (Phases 0–2)#18
Merged
Wires four free-tier Yandex Dialogs platform features into the webhook without adding any external dependency:

- `request.markup.dangerous_context`: graceful refusal with `end_session=true` before NLU/search engages, so flagged content never reaches `mass.music.search`.
- `meta.interfaces.screen`: buttons in the disambiguation prompt are emitted only on surfaces with a screen; voice-only devices (Mini/Pro) get the same ordinal prompt without the button payload.
- `request.original_utterance`: logged alongside the normalised command for misclassification post-mortems.
- `request.nlu.entities[YANDEX.NUMBER]`: a new `ParsedControl` action, `volume_relative`, handles "прибавь на 20" / "убавь 5" / "на 15 громче" ("up by 20" / "down 5" / "15 louder") using regex-captured digits, with the entity as fallback. The executor reads the current `player.volume_level`, applies the signed delta, clamps to [0, 100], and dispatches `cmd_volume_set`. Bare "прибавь" / "убавь" ("turn up" / "turn down") without a number still resolve to `volume_up` / `volume_down` via the existing `_CONTROL_PATTERNS` rules.

Player resolution and music search remain in-house (domain logic, not NLU-shaped).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
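A minimal sketch of the `volume_relative` flow, assuming the command has already been normalised to lowercase with numbers rendered as digits, and that entities arrive as dicts with `type` and `value` keys. `parse_volume_delta`, `apply_delta`, and the keyword tuples are hypothetical; the actual patterns in `dialogs_control.py` may be stricter.

```python
from __future__ import annotations

import re
from typing import Any

# Hypothetical keyword sets; the real patterns in dialogs_control.py may differ.
_UP_WORDS = ("прибавь", "громче")
_DOWN_WORDS = ("убавь", "тише")
_DIGITS_RE = re.compile(r"\b(\d{1,3})\b")


def parse_volume_delta(command: str, entities: list[dict[str, Any]]) -> int | None:
    """Return a signed volume delta, or None if no number was spoken."""
    text = command.lower()
    if any(word in text for word in _UP_WORDS):
        sign = 1
    elif any(word in text for word in _DOWN_WORDS):
        sign = -1
    else:
        return None  # not a volume phrase at all
    match = _DIGITS_RE.search(text)
    if match is not None:
        magnitude = int(match.group(1))
    else:
        # Fall back to the platform's YANDEX.NUMBER entity.
        numbers = [e["value"] for e in entities if e.get("type") == "YANDEX.NUMBER"]
        if not numbers:
            return None  # bare "прибавь" stays volume_up / volume_down
        magnitude = int(numbers[0])
    # max(0, ...) keeps a spoken zero a no-op and rejects negative magnitudes.
    return sign * max(0, magnitude)


def apply_delta(current_level: int, delta: int) -> int:
    """Apply a signed delta to the current level and clamp to [0, 100]."""
    return max(0, min(100, current_level + delta))
```

For example, with the player at 90, "прибавь на 20" yields a delta of +20 and `apply_delta(90, 20)` clamps the result to 100.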
Platform-aware response improvements, landing without new dependencies. The visual pieces are gated on `meta.interfaces.screen`, so voice-only surfaces (Mini, Pro) are unaffected:

- `card` parameter plumbed through `_yandex_response` (BigImage / ItemsList / ImageGallery shapes documented). Actual emission is deferred to Phase 1.5: BigImage requires the `image_id` of a pre-uploaded asset, and per-track album art can't be uploaded within the 3-second webhook budget.
- Suggestion buttons (Следующая / Пауза / Громче / Тише, i.e. Next / Pause / Louder / Quieter) are appended to play- and control-success responses on screened surfaces. This lets the user follow up with a tap instead of repeating the activation phrase.
- TTS dictionary moved to `provider/tts_dictionary.py` with two tables: `WORD_REPLACEMENTS` (Russian stress hints plus ~26 foreign single-word artist transliterations) and `PHRASE_REPLACEMENTS` (16 multi-word band names, applied before the per-word regex). `_tts_for` now matches both Latin and Cyrillic, so "Включаю Metallica" ("Playing Metallica") emits `tts="Включ+аю Мет+аллика"` while `text` stays clean.
- `voice_continuation` toggle (`CONF_DIALOG_VOICE_CONTINUATION`, default off): when enabled, play- and control-success responses keep `end_session=False` for natural follow-ups. "стоп / выключи / выключи музыку" ("stop / turn off / turn off the music") always closes the session regardless of the toggle. No UI surface yet; this is a power-user knob for now.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
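A minimal sketch of the two-pass substitution described above: phrases first, then a per-word regex that covers both Latin and Cyrillic. The table contents and the `tts_for` name are illustrative assumptions, not excerpts of `provider/tts_dictionary.py`; the `+` before a vowel is the stress marker shown in the example above.

```python
from __future__ import annotations

import re

# Illustrative entries only; the real tables live in provider/tts_dictionary.py.
PHRASE_REPLACEMENTS = {"red hot chili peppers": "Ред Хот Ч+или П+эпперс"}
WORD_REPLACEMENTS = {"включаю": "включ+аю", "metallica": "Мет+аллика"}

# One token = a run of Latin or Cyrillic letters (Ё/ё included).
_WORD_RE = re.compile(r"[A-Za-zА-Яа-яЁё]+")


def tts_for(text: str) -> str:
    """Build the `tts` string; the caller keeps the display `text` unchanged."""
    result = text
    # Pass 1: multi-word phrases, case-insensitive, on word boundaries.
    for phrase, spoken in PHRASE_REPLACEMENTS.items():
        pattern = r"\b" + re.escape(phrase) + r"\b"
        result = re.sub(pattern, spoken, result, flags=re.IGNORECASE)

    # Pass 2: single words, preserving a leading capital letter.
    def _replace(match: re.Match[str]) -> str:
        word = match.group(0)
        spoken = WORD_REPLACEMENTS.get(word.lower())
        if spoken is None:
            return word
        return spoken[0].upper() + spoken[1:] if word[0].isupper() else spoken

    return _WORD_RE.sub(_replace, result)
```

Running the phrase pass first keeps a multi-word band name from being split into separately transliterated words; only the `tts` field is rewritten, so the on-screen `text` keeps the original spelling.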
…t.nlu.intents (Phase 2)
Delegates intent classification to Yandex's grammar engine for the closed
set of control commands and the my_wave play intent. The platform's
synchronously classified `request.nlu.intents.<form_name>` block now takes
precedence in the webhook handler; the existing regex parsers
(parse_command / parse_control) remain as fallback for phrases that
don't match any declared grammar — so this is purely additive coverage,
no regression risk.
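The precedence rule can be sketched as follows. The payload shape (`request.nlu.intents` keyed by grammar name) follows the description above; `_INTENT_ACTIONS`, `classify`, and the `(kind, action)` tuples are hypothetical stand-ins for what `provider/dialogs_grammar.py` and the webhook handler actually do.

```python
from __future__ import annotations

from typing import Any, Callable, Dict, Optional, Tuple

Action = Tuple[str, str]

# Hypothetical mapping from declared grammar names to internal actions;
# the real table lives in provider/dialogs_grammar.py.
_INTENT_ACTIONS: Dict[str, Action] = {
    "control.pause": ("control", "pause"),
    "control.resume": ("control", "resume"),
    "control.next": ("control", "next"),
    "control.volume_up": ("control", "volume_up"),
    "play.my_wave": ("play", "my_wave"),
}


def classify(
    body: Dict[str, Any], regex_fallback: Callable[[str], Optional[Action]]
) -> Optional[Action]:
    """Prefer the platform's intent match; fall back to local regex parsing."""
    request = body.get("request", {})
    intents = request.get("nlu", {}).get("intents") or {}
    # Platform-classified intents win. Built-in YANDEX.* keys are not in the
    # table, so they are skipped here.
    for name in intents:
        action = _INTENT_ACTIONS.get(name)
        if action is not None:
            return action
    # Long tail: phrases that no declared grammar matched.
    return regex_fallback(request.get("command", ""))
```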
Eleven grammars ship in provider/dialogs_grammar.py:
control.{pause,resume,next,previous,stop,volume_up,volume_down,
shuffle_on,shuffle_off,now_playing}
play.my_wave
Each carries `positiveTests` for the dev-console "Протестировать" ("Test") button
and uses %lemma where multi-word morphology matters. All grammar bodies
are conservative — Yandex's server-side validator catches malformed
sources synchronously and surfaces them as DialogsIntentValidationError,
so set_intents() will fail loud rather than silently deploying broken
NLU.
The intents pipeline runs between draft update and request_deploy, so
they land in the same moderation cycle as the rest of the draft (no
two-phase publish needed). Endpoints + payload shape were derived from
a Playwright probe of the live dev console on 2026-05-07; the
ya-dialogs-api>=2.1.0 dependency wraps the five new REST endpoints.
Bumps ya-dialogs-api>=2.1.0 in pyproject.toml + manifest.json.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Yandex automatically classifies four built-in intents (CONFIRM / REJECT /
HELP / REPEAT) once any custom grammar is declared on the skill — and we
declared eleven in the previous Phase 2 commit. This wires up the two
that have unambiguous behaviour in our flows:
- YANDEX.REJECT during a pending disambiguation prompt or slot-elicit
("На какой колонке?" / "Что включить?", i.e. "Which speaker?" /
"What should I play?") → respond "Хорошо, отменил." ("OK, cancelled."),
clear pending state, end session. Outside of those prompts the intent
falls through to normal command parsing: "отмена" ("cancel") without
context is not a free-standing app-cancel signal.
- YANDEX.HELP → contextual hint matching the current prompt: re-explain
how to answer the disambiguation, suggest example queries during slot
elicit, or surface a generic "включи рок на кухне" ("play rock in the
kitchen") example otherwise. State is preserved so the user can answer
the original question next.
CONFIRM and REPEAT are deferred:
- CONFIRM is ambiguous in our flows (which player is the user agreeing
to? — we have no canonical "yes" target).
- REPEAT requires caching the last response on session_state.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
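The REJECT/HELP handling above can be sketched as follows, assuming the pending prompt is tracked as a single field on a session object. `SessionState`, `handle_builtin`, the prompt labels, and the two per-prompt hint strings are hypothetical; the cancellation reply and the example phrase inside the generic hint are the ones quoted above.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any, Dict, Optional

# Generic hint wrapping the example phrase quoted in the commit message.
_GENERIC_HINT = "Скажи, например: «включи рок на кухне»."
# Hypothetical per-prompt hints; the provider's wording may differ.
_PROMPT_HINTS = {
    "disambiguation": "Назови номер или название колонки.",
    "slot_elicit": "Назови исполнителя, альбом или жанр.",
}


@dataclass
class SessionState:
    # None, "disambiguation", or "slot_elicit" (hypothetical labels).
    pending: Optional[str] = None


def handle_builtin(intents: Dict[str, Any], state: SessionState) -> Optional[Dict[str, Any]]:
    """Return a response for REJECT/HELP, or None to continue normal parsing."""
    if "YANDEX.REJECT" in intents and state.pending is not None:
        state.pending = None  # clear the prompt we were waiting on
        return {"text": "Хорошо, отменил.", "end_session": True}
    if "YANDEX.HELP" in intents:
        # State is kept so the user can still answer the original question.
        text = _PROMPT_HINTS.get(state.pending or "", _GENERIC_HINT)
        return {"text": text, "end_session": False}
    return None  # REJECT outside a prompt falls through to command parsing
```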
Upstream Music Assistant CLAUDE.md mandates Sphinx-style docstrings (`:param:` syntax) and explicitly bans Google-style (`Args:`) and bullet-style (`- param:`) sections; this was flagged in the PR #3843 review by @chrisuthe. This commit:

1. Adds a CLAUDE.md at the repo root that mirrors upstream's relevant sections (Behaviour, Code Style, Branching) and adapts the rest to this provider repo's specifics (sync workflow, `provider/` layout, pre-commit gate, debugging via `$HOME/.musicassistant`). It cross-references the Copilot review findings (`is_public_https_url` for any new network input) so future contributors don't reintroduce the same bug.
2. Converts the six Google-style docstring sections that had crept in (`auth_page.py`, `auth_session.py`, `dialog_skill_meta.py`, `dialogs.py`, `dialogs_control.py`, `dialogs_nlu.py`) to Sphinx style. No behaviour change.

The webhook-handler error-handling concern from the same review thread is noted in CLAUDE.md as a known follow-up but not addressed here; that's a separate code change.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
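For reference, a Sphinx-style docstring of the kind upstream asks for looks like the following. `clamp_volume` is a hypothetical helper written for illustration, not a function from this PR; a Google-style version would put the same information under `Args:` / `Returns:` headings, which upstream rejects.

```python
def clamp_volume(level: int) -> int:
    """Clamp a requested volume level to the 0..100 range.

    :param level: Requested volume; may be negative or above 100.
    :returns: The level limited to ``0..100``.
    :raises TypeError: If ``level`` is not an ``int``.
    """
    if not isinstance(level, int):
        raise TypeError("level must be an int")
    return max(0, min(100, level))
```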
Per @chrisuthe's review of upstream PR #3843: only `request.json()` was guarded, so an unexpected raise from a parser, the resolver, or MA dispatch bubbled up to aiohttp, which answered HTTP 500 and left Alice silent on the user's device.

Refactors `_handle_webhook` so the post-auth body lives in a new `_handle_authenticated_request` method, called inside a `try / except` that catches any non-`CancelledError` exception, logs it, and returns a generic Russian fallback ("Что-то пошло не так. Попробуй ещё раз.", i.e. "Something went wrong. Try again.") with `end_session=False` so the conversation can continue. The original exception is still emitted via `_logger.exception`, so operators can debug from `$HOME/.musicassistant/musicassistant.log`.

Adds a regression test that injects a `RuntimeError` into `mass.players.all_players` (deep inside the play-resolve path) and verifies that the response is HTTP 200 with the Russian fallback text and `end_session=false`, not HTTP 500. CLAUDE.md is updated to call out the contract so future branches don't regress it.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
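A stripped-down sketch of the wrapper, using plain dicts instead of aiohttp objects so it runs standalone. `handle_webhook` and the `inner` callable are hypothetical stand-ins for `_handle_webhook` / `_handle_authenticated_request`; the response envelope follows the public Yandex Dialogs shape (`response.text`, `response.end_session`, `version`).

```python
from __future__ import annotations

import asyncio
import logging
from typing import Any, Awaitable, Callable, Dict

_logger = logging.getLogger(__name__)

FALLBACK_TEXT = "Что-то пошло не так. Попробуй ещё раз."

Handler = Callable[[Dict[str, Any]], Awaitable[Dict[str, Any]]]


async def handle_webhook(body: Dict[str, Any], inner: Handler) -> Dict[str, Any]:
    """Run the post-auth handler; never let an inner exception become HTTP 500."""
    try:
        return await inner(body)
    except asyncio.CancelledError:
        raise  # cancellation must still propagate (e.g. on shutdown)
    except Exception:
        # Full traceback goes to the log for operators.
        _logger.exception("Unhandled error while handling Alice webhook")
        return {
            "response": {"text": FALLBACK_TEXT, "end_session": False},
            "version": "1.0",
        }
```

Since Python 3.8, `asyncio.CancelledError` derives from `BaseException`, so `except Exception` would not catch it anyway; the explicit re-raise documents the intent and keeps cancellation working if the clause is ever broadened.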
The 'sting' entry in tts_dictionary.py is the artist Стинг, not a typo of 'string'. Codespell flagged it on the PR #18 CI run. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Pull request overview
This PR expands the Yandex Dialogs integration for the Music Assistant Alice provider by consuming more of the request envelope, leveraging platform-side intent classification via custom grammars, improving TTS pronunciation, adding screen-only UX affordances, and hardening the webhook to avoid Alice “silence” on unexpected exceptions.
Changes:
- Webhook handler now supports screened-surface UI gating (buttons), dangerous-context refusal, built-in YANDEX intents, platform `request.nlu.intents` precedence, and a post-auth try/except fallback response.
- Control layer adds `volume_relative` parsing (regex + `YANDEX.NUMBER` entity fallback) and execution logic (read current volume, apply delta, clamp).
- Skill provisioning now declares custom grammars via `ya-dialogs-api>=2.1.0`; TTS substitutions moved into a dedicated dictionary module; tests expanded accordingly.
Reviewed changes
Copilot reviewed 17 out of 17 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| tests/test_dialogs.py | Adds coverage for webhook exception fallback, screen button gating, platform-intent dispatch precedence, built-in YANDEX intents, voice continuation, and TTS transliteration/phrase handling. |
| tests/test_dialogs_control.py | Adds parsing + execution tests for volume_relative (regex + entity fallback, clamping). |
| pyproject.toml | Bumps ya-dialogs-api to >=2.1.0. |
| provider/tts_dictionary.py | Introduces curated word/phrase replacement tables for TTS stress marks and foreign-name transliteration. |
| provider/plugin.py | Plumbs voice_continuation config into the webhook handler. |
| provider/manifest.json | Updates runtime requirement to ya-dialogs-api>=2.1.0. |
| provider/dialogs.py | Implements screen detection, suggestion buttons, dangerous-context refusal, platform intent mapping, built-in YANDEX intents, TTS phrase/word passes, and graceful exception fallback. |
| provider/dialogs_nlu.py | Docstring format updates (:returns:). |
| provider/dialogs_grammar.py | Adds skill grammar definitions and runtime mapping from request.nlu.intents to internal command/control types. |
| provider/dialogs_control.py | Adds volume_relative parsing (incl. YANDEX.NUMBER fallback) + execution and docstring format updates. |
| provider/dialog_skill_meta.py | Docstring format update (:raises:). |
| provider/constants.py | Adds CONF_DIALOG_VOICE_CONTINUATION and related commentary. |
| provider/auto_update.py | Supplies intents from build_grammar() during skill update pipeline. |
| provider/auto_create.py | Supplies intents from build_grammar() during skill creation pipeline. |
| provider/auth_session.py | Docstring format update (:raises:). |
| provider/auth_page.py | Docstring format update (:raises:). |
| CLAUDE.md | Adds aligned contributor guidance (commands, docstrings, validation, handler error-handling guarantees). |
Three review threads from Copilot's pass on PR #18:

- **dialogs.py:580**: the DEBUG "Webhook recv" line was emitting `cmd` and `original_utterance` *before* the `dangerous_context` refusal branch, leaking flagged content into `$HOME/.musicassistant/musicassistant.log`. Both fields are now replaced with `<redacted: dangerous_context>` when the flag is set; the remaining structured fields stay intact so operators still see the traffic shape. A regression test injects a flagged phrase and asserts that it's absent from the caplog records.
- **dialogs_control.py:318**: `volume_relative` clamped the magnitude with `max(1, …)`, silently promoting "прибавь на 0" ("turn it up by 0") to a +1 bump. The clamp is now `max(0, …)`, so the parsed delta matches the spoken number; zero is a valid no-op. A parametrised test covers all four phrasings.
- **constants.py:119**: the comment promised that "спасибо" ("thanks") closes the session via the `stop` control intent, but `parse_control` does not match it. Corrected to the phrases that actually match (стоп / останови / выключи / выключи музыку). Pure doc fix.

Plus: pin `ya-dialogs-api==2.1.0` in `provider/manifest.json` (`==` rather than `>=`) so MA installs the exact version the provider was tested against. Bumps VERSION to 1.3.0 with a comprehensive CHANGELOG entry covering the full Phase 0–2 work landing in PR #18, plus these three Copilot fixes.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
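The redaction rule can be sketched as a helper that builds the debug-log fields before anything is written. `webhook_log_fields` and the choice of non-sensitive fields (`screen`, `new_session`) are illustrative assumptions; the marker string is the one quoted above.

```python
from typing import Any, Dict

REDACTED = "<redacted: dangerous_context>"


def webhook_log_fields(body: Dict[str, Any]) -> Dict[str, Any]:
    """Build structured DEBUG fields, hiding user text when the request is flagged."""
    request = body.get("request", {})
    flagged = bool(request.get("markup", {}).get("dangerous_context"))
    fields = {
        "cmd": request.get("command", ""),
        "original_utterance": request.get("original_utterance", ""),
        # Non-sensitive traffic-shape fields (illustrative choice).
        "screen": "screen" in body.get("meta", {}).get("interfaces", {}),
        "new_session": body.get("session", {}).get("new", False),
    }
    if flagged:
        # Redact before the dict ever reaches a logger call.
        fields["cmd"] = REDACTED
        fields["original_utterance"] = REDACTED
    return fields
```

A caller would then log something like `_logger.debug("Webhook recv %s", webhook_log_fields(body))`, so the redaction happens regardless of which branch handles the request afterwards.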
Summary
Six commits delivering the four-phase plan from `docs/NLU_RESEARCH.md`: consume the rest of the Yandex Dialogs request envelope, polish responses for screened surfaces, delegate intent classification to the platform via custom grammar, and harden the webhook against inner-dispatch exceptions. Zero new runtime dependencies on this side; bumps `ya-dialogs-api>=2.1.0` (released to PyPI separately) for the new intent CRUD surface.

| Commit | Changes |
|---|---|
| 3a13d0f | `meta.interfaces` (gate buttons on screen), `request.markup.dangerous_context` (graceful refusal), `request.nlu.entities[YANDEX.NUMBER]` (new `volume_relative` action: «прибавь на 20» / «убавь 5» / «на 15 громче»), log `request.original_utterance` for misclassification post-mortems |
| f052e6f | `card` parameter plumbing, suggestion buttons (Следующая / Пауза / Громче / Тише) on play/control success on screened surfaces, `provider/tts_dictionary.py` with ~40 foreign band-name transliterations (single-word + multi-word phrases), `voice_continuation` opt-in toggle (`end_session=False` after success) |
| fdb24a8 | `ya-dialogs-api 2.1.0` IntentDraft API; runtime dispatcher reads `request.nlu.intents` first, regex parsers remain as fallback for the long tail. BigImage card emission deferred: needs separate image-upload infrastructure |
| 745291f | … |
| a2e7e12 | `CLAUDE.md` aligned with upstream Music Assistant `CLAUDE.md` (Sphinx docstrings, sync workflow, network-input validation, debugging); converted six existing Google-style docstrings to Sphinx |
| d2ce60a | `try / except` so a parser/resolver/dispatch raise surfaces as a Russian fallback («Что-то пошло не так. Попробуй ещё раз.») instead of HTTP 500 → Alice silence (per @chrisuthe review on upstream #3843) |

Resolves all four review threads from upstream music-assistant/server#3843 (two from the Copilot bot, two from @chrisuthe).
Test plan
- `pytest tests/`: 461 passed (was 411 pre-branch)
- `ruff check provider/ tests/`: clean
- `ruff format --check`: clean
- `mypy provider/`: clean (pre-existing `plugin.py:33` warning unrelated)

🤖 Generated with Claude Code